24 research outputs found

    A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    Next-generation sequencing (NGS) technologies produce massive amounts of data, requiring a powerful computational infrastructure, high-quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte-scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, an enhanced Galaxy workflow system made available as a service that lets users process and transfer data easily, reliably, and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. It takes advantage of elastic scaling of compute resources to run multiple workflows in parallel, which helps meet the scale-out analysis needs of modern translational genomics research.

    Genome-wide multi-omics profiling of colorectal cancer identifies immune determinants strongly associated with relapse

    The use and benefit of adjuvant chemotherapy to treat stage II colorectal cancer (CRC) patients is not well understood, since the majority of these patients are cured by surgery alone. Identifying biological markers of relapse is a critical challenge for effectively targeting treatments to the ~20% of patients destined to relapse. We have integrated molecular profiling results of several “omics” data types to determine the most reliable prognostic biomarkers for relapse in CRC, using data from 40 stage I and II CRC patients. We identified 31 multi-omics features that highly correlate with relapse. The data types were integrated using a multi-step analytical approach with consecutive elimination of redundant molecular features. For each data type, a systems biology analysis was performed to identify the pathways, biological processes, and disease categories most affected in relapse. The biomarkers detected in the tumors, urine, and blood of patients indicated a strong association with immune processes, including aberrant regulation of T-cell and B-cell activation that could lead to overall differences in lymphocyte recruitment for tumor infiltration, and markers indicating the likelihood of future relapse. Among several other biological processes, the immune response was the most biologically coherent signature to emerge from our analyses, and it corroborates other studies showing a strong immune response in patients less likely to relapse.

    The Long Noncoding RNA CCAT2 Induces Chromosomal Instability Through BOP1-AURKB Signaling

    BACKGROUND & AIMS: Chromosomal instability (CIN) is a carcinogenesis event that promotes metastasis and resistance to therapy by unclear mechanisms. Expression of the colon cancer-associated transcript 2 gene (CCAT2), which encodes a long noncoding RNA (lncRNA), associates with CIN, but little is known about how CCAT2 lncRNA regulates this cancer-enabling characteristic. METHODS: We performed cytogenetic analysis of colorectal cancer (CRC) cell lines (HCT116, KM12C/SM, and HT29) overexpressing CCAT2 and colon organoids from C57BL/6N mice with the CCAT2 transgene and without (controls). CRC cells were also analyzed by immunofluorescence microscopy, gamma-H2AX, and senescence assays. CCAT2 transgene and control mice were given azoxymethane and dextran sulfate sodium to induce colon tumors. We performed gene expression array and mass spectrometry to detect downstream targets of CCAT2 lncRNA. We characterized interactions between CCAT2 and downstream proteins using MS2 pull-down, RNA immunoprecipitation, and selective 2'-hydroxyl acylation analyzed by primer extension analyses. Downstream proteins were overexpressed in CRC cells and analyzed for CIN. Gene expression levels were measured in CRC and non-tumor tissues from 5 cohorts, comprising more than 900 patients. RESULTS: High expression of CCAT2 induced CIN in CRC cell lines and increased resistance to 5-fluorouracil and oxaliplatin. Mice that expressed the CCAT2 transgene developed chromosome abnormalities, and colon organoids derived from crypt cells of these mice had a higher percentage of chromosome abnormalities compared with organoids from control mice. The transgenic mice given azoxymethane and dextran sulfate sodium developed more and larger colon polyps than control mice given these agents. Microarray analysis and mass spectrometry indicated that expression of CCAT2 increased expression of genes involved in ribosome biogenesis and protein synthesis. CCAT2 lncRNA interacted directly with and stabilized BOP1 ribosomal biogenesis factor (BOP1). CCAT2 also increased expression of MYC, which activated expression of BOP1. Overexpression of BOP1 in CRC cell lines resulted in chromosomal missegregation errors, increased colony formation, and invasiveness, whereas BOP1 knockdown reduced viability. BOP1 promoted CIN by increasing the active form of aurora kinase B, which regulates chromosomal segregation. BOP1 was overexpressed in polyp tissues from CCAT2 transgenic mice compared with healthy tissue. CCAT2 lncRNA and BOP1 mRNA or protein were all increased in microsatellite stable tumors (characterized by CIN), but not in tumors with microsatellite instability, compared with nontumor tissues. Increased levels of CCAT2 lncRNA and BOP1 mRNA correlated with each other and with shorter survival times of patients. CONCLUSIONS: We found that overexpression of CCAT2 in colon cells promotes CIN and carcinogenesis by stabilizing and inducing expression of BOP1, an activator of aurora kinase B. Strategies to target this pathway might be developed for treatment of patients with microsatellite stable colorectal tumors.

    Author Correction: Federated learning enables big data for rare cancer boundary detection.

    doi:10.1038/s41467-023-36188-7. Nature Communications 14.

    Federated learning enables big data for rare cancer boundary detection.

    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability remains a concern. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML by sharing only numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large, diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
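The idea of sharing only numerical model updates can be illustrated with a minimal sketch. It assumes a plain weighted-average aggregation step (in the style of federated averaging), which the abstract does not specify; the function name, site parameters, and sample counts below are all hypothetical and are not taken from the study.

```python
from typing import List


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Combine per-site model parameters into one consensus vector.

    Each site contributes only its parameter vector and its sample count;
    no raw data leaves the site. Sites are weighted by sample count.
    This is an assumed aggregation rule, not the study's documented one.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    n_params = len(site_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Three hypothetical sites with made-up parameters and sample counts.
site_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
site_sizes = [10, 30, 60]
consensus = federated_average(site_weights, site_sizes)
print(consensus)
```

In a real deployment this aggregation would run once per training round, with the consensus vector sent back to every site before the next round of local training.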

    viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors

    An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence or infection in actual tumor samples is generally unknown, but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open source bioinformatics pipeline, viGEN, which allows not only for the detection and quantification of viral RNA, but also for the detection of variants in the viral transcripts. The pipeline includes 4 major modules: the first module aligns and filters out human RNA sequences; the second module maps and counts the remaining unaligned reads against reference genomes of all known and sequenced human viruses; the third module quantifies read counts at the individual viral-gene level, thus allowing for downstream differential expression analysis of viral genes between case and control groups; and the fourth module calls variants in these viruses. To the best of our knowledge, there are no publicly available pipelines or packages that provide this type of complete analysis in one open source package. In this paper, we applied the viGEN pipeline to two case studies. We first demonstrate the working of our pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis on a small, focused study of TCGA liver cancer patients. In the latter cohort, we performed viral-gene quantification, viral-variant extraction, and survival analysis. This allowed us to find differentially expressed viral transcripts and viral variants between the groups of patients, and connect them to clinical outcome. From our analyses, we show that we were able to successfully detect the human papilloma virus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrated similar sensitivity and specificity. We were also able to quantify viral transcripts and extract viral variants using the liver cancer dataset. The results presented corresponded with published literature in terms of rate of detection and the impact of several known variants of the HBV genome. This pipeline is generalizable, and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigenesis. Our viral pipeline could be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and a tutorial, is available at: https://github.com/ICBI/viGEN/
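The third module's idea, counting reads at the level of individual viral genes, can be sketched in a few lines. This is a toy illustration only, not code from the viGEN repository: it assumes reads have already been mapped to viral genomes and are given as (virus, position) pairs, and the function name, virus names, gene names, and coordinates are all made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical annotation: virus name -> list of (gene, start, end),
# with 1-based inclusive coordinates.
GeneTable = Dict[str, List[Tuple[str, int, int]]]


def count_reads_per_gene(alignments: List[Tuple[str, int]],
                         genes: GeneTable) -> Counter:
    """Count mapped reads whose start position falls inside a gene interval.

    Reads mapped to an unannotated virus, or outside every gene interval,
    are not counted.
    """
    counts: Counter = Counter()
    for virus, position in alignments:
        for gene, start, end in genes.get(virus, []):
            if start <= position <= end:
                counts[(virus, gene)] += 1
    return counts


# Made-up annotation and read positions for illustration.
genes = {"virusA": [("geneX", 1, 100), ("geneY", 101, 200)]}
alignments = [
    ("virusA", 10),
    ("virusA", 50),
    ("virusA", 150),
    ("virusB", 5),     # virus without annotation: ignored
    ("virusA", 250),   # outside all gene intervals: ignored
]
counts = count_reads_per_gene(alignments, genes)
for (virus, gene), n in sorted(counts.items()):
    print(virus, gene, n)
```

A per-gene count table of this shape, built for each sample, is the kind of input that downstream differential expression analysis between case and control groups would start from.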

    Data_Sheet_2_viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors.DOCX
